Connect To Drive

Reading required files

Reading Train Annotations

The annotation DataFrame contains bounding-box coordinates and the image class; this is for the train data.

Reading Test Annotations

The annotation DataFrame contains bounding-box coordinates and the image class; this is for the test data.

Image Segregation

Using split-folders to create the train and validation folders

The cells below are commented out. We need to create two directories, one for training and one for validation; the images in them will be used for training and validation respectively. We use the split-folders library to split the images 80-20 and copy them into the two directories.

The lines below are commented out since they do not need to run every time the notebook starts. Uncomment them if the notebook is being run for the first time, and change the directory names appropriately before running.

In the cell below we copied the images from the shortcut and created the two required directories.

Using the split-folders library to split the data

Creating data required for training

Creating list containing files and folder names for training

The above cell generates a list containing car names and folder paths for the training data.

Creating list containing files and folder names for Validation

The above cell generates a list containing car names and folder paths for the validation data.

Creating list containing files and folder names for testing

The above cell generates a list containing car names and folder paths for the test data.

Creating Training dataframe

The above cell generates a DataFrame from the train list created in the cells above.

Creating Validation dataframe

The above cell generates a DataFrame from the validation list.

Creating Testing dataframe

The above cell generates a DataFrame from the test list.
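All three conversions follow the same pattern: parallel lists of car names and file paths become a two-column DataFrame. A minimal sketch (the column names `carName` and `imagePath` are illustrative, not necessarily the notebook's):

```python
import pandas as pd

def lists_to_dataframe(car_names, file_paths):
    """Combine parallel lists of car names and image paths into a DataFrame."""
    return pd.DataFrame({"carName": car_names, "imagePath": file_paths})
```

The same helper can serve the train, validation, and test lists alike.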

Merging train dataframe with annotations

Here we merge the DataFrames to create a unified DataFrame.

For preprocessing the images, the following three simple steps are followed:

Reading images from images folder

Reading annotation file

Extracting image height and width

Merging the image annotation dataset with the data produced in the above steps.

The same is done for validation and test in the next few cells.
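The merge step can be sketched as a left join on the image file name; the key column name `imageName` is an assumption, and a left join keeps every image row even if its annotation is missing:

```python
import pandas as pd

def merge_with_annotations(image_df, annotation_df, key="imageName"):
    """Left-join the image DataFrame with the bounding-box annotations,
    keeping every image even when no annotation row matches."""
    return image_df.merge(annotation_df, on=key, how="left")
```

After the join, rows without a matching annotation show up with NaN coordinates, which is an easy sanity check on the merge.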

Merging Validation dataframe with annotations

Merging test dataframe with annotations

Checking basic details of dataframe

Label encoding the car names for classification purpose

We label-encode the car names and use the encoded labels for training. We need this because we observed a problem in the "Class" column of the annotation DataFrame.
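Label encoding simply maps each distinct car name to an integer id. A minimal sketch that matches scikit-learn's `LabelEncoder` convention of assigning ids in sorted order:

```python
def label_encode(names):
    """Map each distinct car name to an integer id (assigned in sorted
    alphabetical order, like scikit-learn's LabelEncoder)."""
    mapping = {name: idx for idx, name in enumerate(sorted(set(names)))}
    return [mapping[n] for n in names], mapping
```

Keeping the mapping around lets us translate predicted class ids back into car names at inference time.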

Extracting the height and width of the images from Train set

In the cells below we calculate the height and width of each image.

This particular process is time consuming.

So we do it once, save the result in a CSV file, and reuse it whenever needed.

If you are running this notebook for the first time, please uncomment this part and then save the file. This takes at least one hour for the train data and another hour for the test data.

Extracting the height and width of the images from Test set

Creating Train and validation dataframe with width and height

In the above cell we unified everything needed into one DataFrame, which will be used for training.

In the above cell we unified everything needed into one DataFrame, which will be used for validation.

Creating Test dataframe with width and height

In the above cell we unified everything needed into one DataFrame, which will be used for testing.

EDA

Train DataFrame: the table above shows the car classes with the most images.

Train DataFrame: the table above shows the car classes with the fewest images.

Test DataFrame: the table above shows the car classes with the most images.

Test DataFrame: the table above shows the car classes with the fewest images.

The above cell shows the unique car models (year of make) present in the given dataset.

The picture below depicts the number of car images per model year. There are the most images of cars from model year 2012 and very few from model year 1996.

Almost all car classes have roughly the same number of images, around 40. The class "GMC Savana Van 2012" has 60 images, while "Hyundai Accent Sedan 2012" has around 25. The pictures below show the class distribution.
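The per-class counts behind these tables can be computed with a small helper (a sketch; the notebook itself presumably uses pandas `value_counts`):

```python
from collections import Counter

def class_distribution(labels, top=5):
    """Count images per car class and return the `top` most frequent
    and `top` least frequent classes as (class, count) pairs."""
    counts = Counter(labels)
    most = counts.most_common(top)
    least = counts.most_common()[:-top - 1:-1]
    return most, least
```

Comparing the two ends of the distribution is what flags imbalanced classes such as the two named above.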

Here we check whether all car instances share the same location and orientation within the dataset. As observed from the graph, the location of the car within the image varies, and in a significant number of images the car appears very small compared to the image itself.
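One way to quantify "the car is small compared to the image" is the fraction of the image area the bounding box covers; a sketch, assuming corner-format boxes `(x1, y1, x2, y2)`:

```python
def bbox_area_ratio(x1, y1, x2, y2, img_w, img_h):
    """Fraction of the image occupied by the bounding box; small values
    flag images where the car is tiny relative to the frame."""
    box_area = max(0, x2 - x1) * max(0, y2 - y1)
    return box_area / (img_w * img_h)
```

Plotting this ratio across the dataset produces the kind of distribution discussed above.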

Train data: the above cell shows the image of maximum size.

The corresponding image is shown in the cell below.

Train data: the above cell shows the image of minimum size.

The corresponding image is shown in the cell below; this is one of the sample images.

Test data: the above cell shows the image of maximum size.

The corresponding image is shown in the cell below; this is one of the sample images.

Test data: the above cell shows the image of minimum size.

The corresponding image is shown in the cell below; this is one of the sample images.

The above cell shows the overall brightness for one of the sample images in the train dataset.
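Overall brightness is commonly computed as the mean perceived luminance over all pixels; a sketch using the standard luma weights (the notebook's exact formula may differ):

```python
def mean_brightness(pixels):
    """Average perceived brightness of an RGB image, using the standard
    luma weights 0.299 R + 0.587 G + 0.114 B per pixel."""
    total = sum(0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels)
    return total / len(pixels)
```

Values near 0 indicate a dark image and values near 255 a bright one, which helps spot under- or over-exposed samples.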

Display sample data with bounding boxes

Display sample data for Train set images

Display sample data for Test set images

Display sample data for Validation set images

Checking number of classes in each dataframe

Model building for Efficient Net B7

Build a Batch Generator

Checking the output of the batch generator

Load Pre-Trained Model

Un-freeze a few layers of the pre-trained model

Add Final layers to the model

Build layer for Classification Label output

Build layer for bounding box output

Finalize the model

Define function to calculate IoU
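A standard implementation of this metric looks like the following (a sketch of the usual corner-format IoU, not necessarily the notebook's exact code):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

IoU is 1.0 for a perfect overlap and 0.0 for disjoint boxes, which is why it serves as the accuracy measure for the regression head.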

Train the model

Model Prediction

The output of the above cell was cleared since it contained images and consumed a lot of space, leading to a huge notebook size.

Above is the classification report

The table above is the classification report for the EfficientNet-B7 model. Overall accuracy is 79%, precision is 78%, recall is 78%, and F1-score is 78%. As observed in the report, the scores are low for a few classes, and those same classes are misclassified among the test samples. The number of samples used to produce the report is 8041. In EfficientNet, the authors propose a new scaling method called compound scaling.

Model building for Efficient Net B5

Build a Batch Generator

Checking the output of the batch generator

Load Pre-Trained Model

Un-freeze a few layers of the pre-trained model

Add Final layers to the model

Build layer for Classification Label output

Build layer for bounding box output

Finalize the model

Define function to calculate IoU

Train the model

Model Prediction

The output of the above cell was cleared since it contained images and consumed a lot of space, leading to a huge notebook size.

Above is the classification report

The table above is the classification report for the EfficientNet-B5 model. Overall accuracy is 75%, precision is 75%, recall is 74%, and F1-score is 74%. As observed in the report, the scores are low for a few classes, and those same classes are misclassified among the test samples. The number of samples used to produce the report is 8041.

Model building for ResNet

Build a Batch Generator

Checking the output of the batch generator

Build the Model

Load Pre-Trained Model

Un-freeze a few layers of the pre-trained model

Add Final layers to the model

Build layer for Classification Label output

Build layer for bounding box output

Finalize the model

Define function to calculate IoU

Train the model

Model Prediction

The output of the above cell was cleared since it contained images and consumed a lot of space, leading to a huge notebook size.

Above is the classification report

The table above is the classification report for the ResNet50 model. Overall accuracy is low for this model; the problem observed with ResNet50 is that the network does not learn well. ResNet follows the conventional approach of scaling the dimensions arbitrarily by adding more and more layers. As observed, ResNet is not suitable for this use case: there are misclassifications, and the bounding boxes the model overlays on the images are misaligned. The number of samples used to produce the report is 8041.

Model building for Mobile Net

Build a Batch Generator

Checking the output of the batch generator

Build the Model

Load Pre-Trained Model

Un-freeze a few layers of the pre-trained model

Add Final layers to the model

Build layer for Classification Label output

Build layer for bounding box output

Finalize the model

Define function to calculate IoU

Train the model

Model Prediction

The output of the above cell was cleared since it contained images and consumed a lot of space, leading to a huge notebook size.

Above is the classification report. The table is for the MobileNet model. Overall accuracy is low for this model; the problem observed with MobileNet is that it is a small network and not suitable for this use case. The number of weights to be learned is very small, leading to misclassifications, and the bounding boxes the model overlays on the images are misaligned. The number of samples used to produce the report is 8041.

Conclusion

As observed in the experiments above with different CNN models, EfficientNet-B7 is the one that provides good accuracy and regression output compared to the other models. EfficientNet is a convolutional neural network architecture and scaling method that uniformly scales all dimensions of depth, width, and resolution using a compound coefficient. Unlike conventional practice, which scales these factors arbitrarily, the EfficientNet scaling method uniformly scales network width, depth, and resolution with a set of fixed scaling coefficients.

EfficientNet-B7 has 813 layers and around 71 million weights to be learned; here we trained about 67 million of those parameters, from layer 357 onwards. These later layers are the ones that learn in-depth features of the images, and training them helped produce a well-trained model.

We use categorical_crossentropy for classification and MSE for regression; the overall loss of the model is the sum of the classification loss and the regression loss. To measure how accurate the bounding boxes are, we defined a function to calculate IoU (Intersection over Union), an evaluation metric used to measure the accuracy of an object detector on a particular dataset. EfficientNet-B7 has 813 layers in total, and we made it trainable from layer 351 onwards, giving 64 million trainable parameters. The head of the model has a GlobalAveragePooling2D layer, one Dense layer, and a BatchNormalization layer. For classification we use a softmax activation with 196 classes, and for regression a sigmoid activation for the 4 coordinates of the bounding box.
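The combined loss described above can be sketched numerically, independent of any framework (toy values below, not real model outputs):

```python
import math

def categorical_crossentropy(y_true, y_pred):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

def mse(y_true, y_pred):
    """Mean squared error over the 4 bounding-box coordinates."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def total_loss(cls_true, cls_pred, box_true, box_pred):
    """Overall loss = classification loss + regression loss, as in the model."""
    return categorical_crossentropy(cls_true, cls_pred) + mse(box_true, box_pred)
```

Because both terms are simply added, their scales influence how the optimizer balances classification against box regression; sigmoid-normalized coordinates keep the MSE term in a comparable range.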

Here we chose EfficientNet-B7. Since this is an interim submission, we plan to continue our experiments with other networks and methodologies: we are trying to build YOLO, we are using the TFOD framework, and we have also started with EfficientDet, to check which gives the best result. For the final submission we will compare all the models, deploy the best one, and present the results in a nice user interface.

References

https://paperswithcode.com/method/efficientnet